I'm new to packaging and publishing on PyPI. I've been searching for information about best practices for modules and namespaces, and for using __init__.py to manipulate them, but I haven't been able to find a generally accepted approach.
Considering a package with multiple modules (and possibly sub-packages), there seem to be three different approaches:
1. Leave __init__.py blank. This enforces explicit imports and thus clear namespaces. I've read Alex Martelli's posts in favor of this option in various questions on Stack Overflow. The con: users of the package have to import separate modules and call them with dot notation.
2. Import all modules in __init__.py. The user doesn't have to do multiple imports. The cons: explicit vs. implicit, and also, as Martelli puts it (paraphrased), "if that's how your package works, maybe it should all go in a single module anyway".
3. Import key functions from various modules directly into the package namespace. If you restructure modules, you can still keep the same API for end users. The cons: it dirties the namespace, and it's very implicit / hacky.
I've been looking around on GitHub, and I've seen all three approaches in various projects, so I'm guessing there isn't really a consensus on what the best practice is.
I'm really interested in hearing from the pros, how they think about this, what they do, what they suggest... Or maybe there are subtleties that I'm missing... Any help is appreciated.
Thanks in advance
EDIT: Thanks a lot for all the great posts! They provide a lot to think about.
A couple of people asked about links, so here are a few I found to be useful:
http://guide.python-distribute.org/introduction.html
http://docs.python-guide.org/en/latest/writing/structure/
http://foobar.lu/wp/2012/05/13/a-comprehensive-step-through-python-packaging-a-k-a-setup-scripts/
(If you have links to other good guides to packaging, they'd be appreciated!)
And the Stack Overflow posts I mentioned:
http://stackoverflow.com/questions/1801878/the-pythonic-way-of-organizing-modules-and-packages
http://stackoverflow.com/questions/1944569/how-do-i-write-good-correct-init-py-files
http://stackoverflow.com/questions/2360724/in-python-what-exactly-does-import-import
I tend toward option 3 for my own work. For me, it's about declaring a 'public' API for the module, e.g.
stuff/
    __init__.py
    bigstuff.py
    privateStuff.py
And __init__.py would have:
from .bigstuff import Stuffinator, Stuffinatrix
This essentially says that stuff.Stuffinator and stuff.Stuffinatrix are the only parts of the package intended for public use. While there's nothing stopping people from doing a from stuff.bigstuff import Stuffometer or from stuff.privateStuff import HiddenStuff, they'll at least know they're peeking behind the curtain at that point. Far from being implicit, I find this approach quite explicit.
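To make that concrete, here's a self-contained sketch of the layout above. The package is built in a temp directory purely so the snippet runs as-is; Stuffinator, Stuffinatrix and HiddenStuff are the hypothetical names from this post, not a real library.

```python
import os
import sys
import tempfile

# Recreate the hypothetical "stuff" package on disk.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "stuff")
os.mkdir(pkg)

with open(os.path.join(pkg, "bigstuff.py"), "w") as f:
    f.write("class Stuffinator: pass\nclass Stuffinatrix: pass\n")

with open(os.path.join(pkg, "privateStuff.py"), "w") as f:
    f.write("class HiddenStuff: pass\n")

# __init__.py re-exports only the intended public API.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .bigstuff import Stuffinator, Stuffinatrix\n")

sys.path.insert(0, root)
import stuff

# Users get the public names directly on the package...
print(stuff.Stuffinator, stuff.Stuffinatrix)

# ...while the internals stay reachable, but clearly "behind the curtain".
from stuff.privateStuff import HiddenStuff
```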
Also, flat is better than nested, really.
Having to import things from various places just to make basic use of the API work is very much against Python's philosophy.
As for the .api subpackage: why not just use __init__.py? It's less to type and remember, and it feels like a pretty natural place for the "public API".
You can also stick important parameters in there, like version, author, release date, or image DPI.
Don't forget __doc__ with decent sample usage.
Option 3 is definitely how this should work, and here's why. You start out with your library having a package "foo" and a module "bar". Users make use of things inside "bar", like from foo.bar import x, y, z. Then one day "bar" starts getting really big: the implementations become more complex and get broken out, and features are added. The way you deal with this is by turning bar.py into a package; the __init__.py inside bar/ essentially replaces bar.py. Your users see no change in the API, and there's no need for them to learn exactly which submodule inside the new bar package they need to use (nor should there be, as things can keep changing many more times; it wouldn't be right to expose the user base to each of those changes when it's entirely unnecessary).
There's nothing hacky about this at all, it's how __init__.py is meant to be used, and to those saying "explicit is better than implicit" I'd counter with "practicality beats purity" and "flat is better than nested", not to mention a foolish consistency is the hobgoblin of little minds.
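A runnable sketch of that refactor (hypothetical names, with the "after" state built in a temp directory): bar.py has become the package bar/, and bar/__init__.py re-exports the same names, so users' existing imports keep working.

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo", "bar"))
open(os.path.join(root, "foo", "__init__.py"), "w").close()

# The implementation that used to live in bar.py, now split out.
with open(os.path.join(root, "foo", "bar", "_impl.py"), "w") as f:
    f.write("x, y, z = 1, 2, 3\n")

# bar/__init__.py stands in for the old bar.py: same public names, new home.
with open(os.path.join(root, "foo", "bar", "__init__.py"), "w") as f:
    f.write("from ._impl import x, y, z\n")

sys.path.insert(0, root)

# The import users wrote against the old single-module bar still works.
from foo.bar import x, y, z
print(x, y, z)
```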
I like importing key functions and classes. Flat is better than nested, so as a user of a library, I prefer from library import ThingIWant or import library and then using library.ThingIWant rather than from library.things.thing_i_want import ThingIWant.
More specifically, I would often have the contents of __init__.py be:
"""Docstring explaining package"""
from thispackage.module_or_subpackage import *
from thispackage.module_thats_next_alphabetically import *
...
And then have each module use __all__ to specify which names constitute its public API that should be exposed by the package.
For something you are distributing publicly, I don't think your __init__.py should ever be "blank" as specified in option 1: you should at least include a docstring explaining what the package does. This will help users poking around in IPython, etc.
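Here's a minimal sketch of how __all__ gates what import * re-exports. The submodule is created in memory only so the example is self-contained; in a real package it would be a .py file next to __init__.py.

```python
import sys
import types

# Simulate a submodule whose public API is declared via __all__.
mod = types.ModuleType("thispackage_module")
exec(
    "__all__ = ['public_thing']\n"
    "def public_thing():\n"
    "    return 'public'\n"
    "def _helper():\n"
    "    return 'private'\n",
    mod.__dict__,
)
sys.modules["thispackage_module"] = mod

# What the package's __init__.py effectively does:
ns = {}
exec("from thispackage_module import *", ns)

# Only the name listed in __all__ comes across; _helper stays private.
print(sorted(n for n in ns if not n.startswith("__")))
```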
Nothing. Explicit is better than implicit.
I recommend option 1 for most cases. We usually include an api.py module that puts in the convenience imports that one would otherwise put into the __init__.py. Many users of the foo.bar package will do from foo.bar.api import FooBar. Others that want more control will do from foo.bar.foo_bar import FooBar.
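A sketch of that api.py convention with hypothetical contents (built in a temp directory so it runs; FooBar and the module names come from the comment above):

```python
import os
import sys
import tempfile

# Hypothetical layout:
#   foo/bar/__init__.py   (left blank, per option 1)
#   foo/bar/foo_bar.py    (the real implementation)
#   foo/bar/api.py        (convenience re-exports)
files = {
    "foo/__init__.py": "",
    "foo/bar/__init__.py": "",
    "foo/bar/foo_bar.py": "class FooBar: pass\n",
    "foo/bar/api.py": "from .foo_bar import FooBar\n",
}

root = tempfile.mkdtemp()
for path, body in files.items():
    full = os.path.join(root, *path.split("/"))
    os.makedirs(os.path.dirname(full), exist_ok=True)
    with open(full, "w") as f:
        f.write(body)
sys.path.insert(0, root)

# Convenience import for most users:
from foo.bar.api import FooBar
# Direct import for users who want fine-grained control:
from foo.bar.foo_bar import FooBar as DirectFooBar

print(FooBar is DirectFooBar)  # same class either way
```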
I'm personally a fan of how Werkzeug does things, and I use its approach in my own projects. It allows for the explicit import style of (1), but also allows (3) for convenience if you prefer that.
Well, that is rather clever (what else would you expect from mitsuhiko?), but it unfortunately completely bamboozles code-intelligence tools (IDEs, static analyzers, and such).
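For reference, the trick being discussed is lazy attribute loading at the package level. This is not Werkzeug's actual code (old Werkzeug swapped in a custom module class); it's a sketch of the same idea using a module-level __getattr__ (PEP 562, Python 3.7+), with a throwaway in-memory module standing in for the package:

```python
import importlib
import sys
import types

# Stand-in package whose attributes are imported only on first access.
pkg = types.ModuleType("lazypkg")

# Map public attribute name -> the module that really provides it.
_lazy_map = {"sqrt": "math", "dumps": "json"}

def _module_getattr(name):
    try:
        source = _lazy_map[name]
    except KeyError:
        raise AttributeError(name)
    value = getattr(importlib.import_module(source), name)
    setattr(pkg, name, value)  # cache: __getattr__ runs once per name
    return value

pkg.__getattr__ = _module_getattr  # PEP 562 hook
sys.modules["lazypkg"] = pkg

import lazypkg
print(lazypkg.sqrt(16.0))  # "math" is only imported at this point
```

This is also exactly why such schemes confuse static analyzers: the re-exported names never appear in the source for a tool to find.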
Any chance you could post/add the links you found in regards to packaging and publishing? I'm new to it as well, and haven't found anything helpful so far
In my opinion, 2 and 3 are both abuses of __init__.py. On the other hand, characterizing option 1 as "leave it blank" is also too strict. The purpose of __init__.py, to my mind, is to hold shared initialization code. In support of that, it should contain side-effect-ish code that you want run before anything gets imported from the package. For simple, or even moderately complex cases, this can degenerate to "leave it blank". But when there's a win for sharing initialization code, it may be an ideal place.
Note that using __init__.py in this way makes it directly analogous to a class's __init__() method, which I believe is exactly what the BDFL intended to imply when he named it.
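A sketch of what "shared initialization code" might look like in practice (hypothetical package; a null logging handler and an environment flag are common examples of one-time, side-effect-ish setup):

```python
# stuff/__init__.py
import logging
import os

# One-time setup: give the package a null handler so merely importing it
# never emits "no handler" warnings; applications configure real logging.
logger = logging.getLogger("stuff")
logger.addHandler(logging.NullHandler())

# Another typical bit of shared initialization: an environment toggle
# read once, before any submodule is imported.
DEBUG = os.environ.get("STUFF_DEBUG", "") == "1"
```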
I started a Python project once that, in its early days, used __init__.py to do some basic dependency checks and print useful instructions about what to do when critical modules were missing from your system. It seemed clever at the time, but I had to scrap it when I started collaborating with a Debian packager, who said it was useful to be able to build the package on a system that didn't have the deps installed. It turned out that setup.py was importing the version string from the project itself, so you couldn't even run setup.py if the deps were missing, and that made packaging difficult.
So, that got scrapped and it's been empty in every project I've worked on since.
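A common workaround for that setup.py trap is to read the version string out of the file's text instead of importing the package. A sketch, assuming the version lives in the package's __init__.py as a plain __version__ = "..." assignment (the demo file below just stands in for it):

```python
import os
import re
import tempfile

def read_version(path):
    """Extract __version__ from a source file without importing it,
    so setup.py still runs when the package's deps are missing."""
    with open(path) as f:
        source = f.read()
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', source, re.M)
    if match is None:
        raise RuntimeError("version string not found in %s" % path)
    return match.group(1)

# Demo against a throwaway file standing in for stuff/__init__.py:
demo = os.path.join(tempfile.mkdtemp(), "__init__.py")
with open(demo, "w") as f:
    f.write("import heavy_dependency  # would break a real import\n")
    f.write('__version__ = "1.4.2"\n')

print(read_version(demo))  # works even though heavy_dependency is absent
```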